Smart Paper Notes

Principal Geodesic Analysis of Merge Trees (and Persistence Diagrams)

Mathieu Pont, Jules Vidal, Julien Tierny

Categories: Computer Vision | Machine Learning

2022-07-22

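This paper presents a computational framework for the Principal Geodesic Analysis of merge trees (MT-PGA), a novel adaptation of the celebrated Principal Component Analysis (PCA) framework [87] to the Wasserstein metric space of merge trees [92]. We formulate MT-PGA computation as a constrained optimization problem that adjusts a basis of orthogonal geodesic axes while minimizing a fitting energy. We introduce an efficient, iterative algorithm which exploits shared-memory parallelism, as well as an analytic expression of the fitting energy gradient, to ensure fast iterations. Our approach also extends trivially to extremum persistence diagrams. Extensive experiments on public ensembles demonstrate the efficiency of our approach, with MT-PGA computations on the order of minutes for the largest examples. We show the utility of our contributions by extending two typical PCA applications to merge trees. First, we apply MT-PGA to data reduction and reliably compress merge trees by concisely representing them by their first coordinates in the MT-PGA basis. Second, we present a dimensionality reduction framework that exploits the first two directions of the MT-PGA basis to generate two-dimensional layouts of the ensemble. We augment these layouts with persistence correlation views, enabling global and local visual inspection of the feature variability in the ensemble. In both applications, quantitative experiments assess the relevance of our framework. Finally, we provide a lightweight C++ implementation that can be used to reproduce our results.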

Translated by Google Translate

Discrete Morse Sandwich: Fast Computation of Persistence Diagrams for Scalar Data -- An Algorithm and A Benchmark

Pierre Guillou, Jules Vidal, Julien Tierny

Categories: Machine Learning | Computer Vision

2022-06-27

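This paper introduces an efficient algorithm for persistence diagram computation, given an input piecewise linear scalar field $f$ defined on a $d$-dimensional simplicial complex $K$, with $d \leq 3$. Our method extends the seminal "PairCells" algorithm by introducing three main accelerations. First, we express this algorithm within the setting of discrete Morse theory, which considerably reduces the number of input simplices to consider. Second, we introduce a stratification approach to the problem, which we call "sandwiching". Specifically, minima-saddle persistence pairs ($D_0(f)$) and saddle-maximum persistence pairs ($D_{d-1}(f)$) are efficiently computed by processing, with a Union-Find, the unstable sets of 1-saddles and the stable sets of $(d-1)$-saddles, respectively. This fast processing of dimensions 0 and $(d-1)$ further and drastically reduces the number of critical simplices to consider for the computation of $D_1(f)$, the intermediate layer of the sandwich. Third, we document several performance improvements via shared-memory parallelism. We provide an open-source implementation of our algorithm for reproducibility purposes. We also contribute a reproducible benchmark package, which exploits three-dimensional data from a public repository and compares our algorithm to a variety of publicly available implementations. Extensive experiments indicate that our algorithm improves by two orders of magnitude the time performance of the seminal "PairCells" algorithm it extends. Moreover, it also improves memory footprint and time performance over a selection of 14 competing approaches, with a substantial gain over the fastest available approaches, while producing a strictly identical output. We illustrate the utility of our contributions with an application to the fast and robust extraction of persistent 1-dimensional generators on surfaces, volume data and high-dimensional point clouds.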
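
To give a feel for the Union-Find step mentioned in the abstract, here is a minimal sketch of the classical elder-rule computation of minimum-saddle pairs for a scalar field sampled on the vertices of a graph. It is not the Discrete Morse Sandwich algorithm itself, which processes the unstable sets of 1-saddles of a discrete gradient; the function name `zero_dim_persistence` and the graph-based input format are assumptions made for illustration.

```python
def zero_dim_persistence(values, edges):
    """Return sorted (birth, death) pairs of 0-dimensional persistence.

    values: scalar value per vertex; edges: list of (u, v) vertex pairs.
    Vertices are swept by increasing value; when two components meet,
    the one with the higher minimum dies (elder rule). The component of
    the global minimum never dies and is therefore not reported.
    """
    n = len(values)
    parent = list(range(n))   # union-find forest
    lowest = list(range(n))   # lowest vertex of each component (valid at roots)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    order = sorted(range(n), key=lambda v: (values[v], v))
    rank = {v: i for i, v in enumerate(order)}
    adjacency = [[] for _ in range(n)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    added = [False] * n
    pairs = []
    for v in order:
        added[v] = True
        for u in adjacency[v]:
            if not added[u]:
                continue
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            # The component whose minimum appeared later is the younger one.
            if rank[lowest[ru]] < rank[lowest[rv]]:
                young, old = rv, ru
            else:
                young, old = ru, rv
            # Skip the trivial case where v itself is the "minimum" that dies.
            if lowest[young] != v:
                pairs.append((values[lowest[young]], values[v]))
            parent[young] = old
    return sorted(pairs)


if __name__ == "__main__":
    # Path graph with minima at values 0, 1 and 2.
    print(zero_dim_persistence([0, 3, 1, 4, 2], [(0, 1), (1, 2), (2, 3), (3, 4)]))
```

On the path graph in the usage example, the minima with values 1 and 2 are paired with the saddles of values 3 and 4, while the global minimum stays unpaired.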

Domain generalization of 3D semantic segmentation in autonomous driving

Jules Sanchez, Jean-Emmanuel Deschaud, Francois Goulette

Categories: Computer Vision

2022-12-07

3D semantic segmentation for autonomous driving using deep learning has become a well-studied subject, providing methods that can reach very high performance. Nonetheless, because of the limited size of the training datasets, these models cannot see every type of object and scene found in real-world applications. The ability to be reliable in these various unknown environments is called domain generalization. Despite its importance, domain generalization is relatively unexplored in the case of 3D autonomous driving semantic segmentation. To fill this gap, this paper presents the first benchmark for this application by testing state-of-the-art methods and discussing the difficulty of tackling LiDAR domain shifts. We also propose the first method designed to address this domain generalization, which we call 3DLabelProp. This method leverages the geometry and sequentiality of the LiDAR data to enhance its generalization performance by working on partially accumulated point clouds. It reaches a mIoU of 52.6% on SemanticPOSS while being trained only on SemanticKITTI, making it the state-of-the-art method for generalization (+7.4% better than the second-best method). The code for this method will be available on GitHub.
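
For readers unfamiliar with the mIoU metric quoted above, the following is a minimal sketch of how a mean intersection-over-union is commonly computed from per-point labels. The function name `mean_iou` and the convention of skipping classes absent from both ground truth and prediction are assumptions; the benchmark in the paper may use a different convention.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union across classes for flat label lists.

    Classes that appear in neither y_true nor y_pred are skipped
    (an assumed convention, not necessarily the paper's).
    """
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Four points, two classes: one point of class 0 is mislabelled as class 1.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```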

On Utilizing Relationships for Transferable Few-Shot Fine-Grained Object Detection

Ambar Pal, Arnau Ramisa, Amit Kumar K C, René Vidal

Categories: Computer Vision | Artificial Intelligence

2022-12-01

State-of-the-art object detectors are fast and accurate, but they require a large amount of well-annotated training data to obtain good performance. However, obtaining a large amount of training annotations specific to a particular task, i.e., fine-grained annotations, is costly in practice. In contrast, obtaining common-sense relationships from text, e.g., "a table-lamp is a lamp that sits on top of a table", is much easier. Additionally, common-sense relationships like "on-top-of" are easy to annotate in a task-agnostic fashion. In this paper, we propose a probabilistic model that uses such relational knowledge to transform an off-the-shelf detector of coarse object categories (e.g., "table", "lamp") into a detector of fine-grained categories (e.g., "table-lamp"). We demonstrate that our method, RelDetect, achieves performance competitive with fine-tuning-based state-of-the-art object detector baselines when an extremely low amount of fine-grained annotations is available ($0.2\%$ of the entire dataset). We also demonstrate that RelDetect is able to utilize the inherent transferability of relationship information to obtain better performance ($+5$ mAP points) than the above baselines on an unseen dataset (zero-shot transfer). In summary, we demonstrate the power of using relationships for object detection on datasets where fine-grained object categories can be linked to coarse-grained categories via suitable relationships.

Taming Hyperparameter Tuning in Continuous Normalizing Flows Using the JKO Scheme

Alexander Vidal, Samy Wu Fung, Luis Tenorio, Stanley Osher, Levon Nurbekyan

Categories: Machine Learning

2022-11-30

A normalizing flow (NF) is a mapping that transforms a chosen probability distribution to a normal distribution. Such flows are a common technique used for data generation and density estimation in machine learning and data science. The density estimate obtained with an NF requires a change of variables formula that involves the computation of the Jacobian determinant of the NF transformation. In order to tractably compute this determinant, continuous normalizing flows (CNF) estimate the mapping and its Jacobian determinant using a neural ODE. Optimal transport (OT) theory has been successfully used to assist in finding CNFs by formulating them as OT problems with a soft penalty for enforcing the standard normal distribution as a target measure. A drawback of OT-based CNFs is the addition of a hyperparameter, $\alpha$, that controls the strength of the soft penalty and requires significant tuning. We present JKO-Flow, an algorithm to solve OT-based CNF without the need to tune $\alpha$. This is achieved by integrating the OT CNF framework into a Wasserstein gradient flow framework, also known as the JKO scheme. Instead of tuning $\alpha$, we repeatedly solve the optimization problem for a fixed $\alpha$, effectively performing a JKO update with a time-step $\alpha$. Hence we obtain a "divide and conquer" algorithm by repeatedly solving simpler problems instead of solving a potentially harder problem with large $\alpha$.
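
The two ingredients named in the abstract can be written out in their generic textbook form. The symbols below are assumptions introduced for illustration: $f$ is the invertible flow, $\rho_0$ the data density, $\rho_1$ the standard normal density, $W_2$ the 2-Wasserstein distance, and $F$ a placeholder for the functional being minimized (the abstract does not spell it out).

```latex
% Change of variables for an invertible map f that pushes \rho_0 to \rho_1:
\log \rho_0(x) = \log \rho_1\bigl(f(x)\bigr) + \log \bigl|\det \nabla f(x)\bigr|

% Generic JKO step with time-step \alpha for a functional F on densities:
\rho^{(k+1)} = \operatorname*{arg\,min}_{\rho} \;
  \frac{1}{2\alpha} W_2^2\bigl(\rho, \rho^{(k)}\bigr) + F(\rho)
```

Read this way, repeating the second update with a fixed $\alpha$ moves the density toward the target in small steps, which matches the "divide and conquer" description in the abstract.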

Encoder-Decoder Model for Suffix Prediction in Predictive Monitoring

Efrén Rama-Maneiro, Pablo Monteagudo-Lago, Juan C. Vidal, Manuel Lama

Categories: Machine Learning | Artificial Intelligence

2022-11-29

Predictive monitoring is a subfield of process mining that aims to predict how a running case will unfold in the future. One of its main challenges is forecasting the sequence of activities that will occur from a given point in time (suffix prediction). Most approaches to the suffix prediction problem learn to predict the suffix by learning only how to predict the next activity, without learning from the whole suffix during the training phase. This paper proposes a novel architecture based on an encoder-decoder model with an attention mechanism that decouples the representation learning of the prefixes from the inference phase, predicting only the activities of the suffix. During the inference phase, this architecture is extended with a heuristic search algorithm that improves the selection of the activity for each index of the suffix. Our approach has been tested using 12 public event logs against 6 different state-of-the-art proposals, showing that it significantly outperforms these proposals.

Facial Tic Detection in Untrimmed Videos of Tourette Syndrome Patients

Yutao Tang, Benjamín Béjar, Joey K.-Y. Essoe, Joseph F. McGuire, René Vidal

Categories: Computer Vision

2022-11-07

Tourette Syndrome (TS) is a behavior disorder with onset in childhood, characterized by the expression of involuntary movements and sounds commonly referred to as tics. Behavioral therapy is the first-line treatment for patients with TS, and it helps patients raise awareness about tic occurrence as well as develop tic inhibition strategies. However, the limited availability of therapists and the difficulties of in-home follow-up work limit its effectiveness. An automatic tic detection system that is easy to deploy could alleviate the difficulties of home therapy by providing feedback to the patients while they exercise tic awareness. In this work, we propose a novel architecture (T-Net) for automatic tic detection and classification from untrimmed videos. T-Net combines temporal detection and segmentation and operates on features that are interpretable to a clinician. We compare T-Net to several state-of-the-art systems working on deep features extracted from the raw videos; T-Net achieves comparable performance in terms of average precision while relying on the interpretable features needed in clinical practice.

Bilevel Optimization for Feature Selection in the Data-Driven Newsvendor Problem

Breno Serrano, Stefan Minner, Maximilian Schiffer, Thibaut Vidal

Categories: Machine Learning

2022-09-12

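We study the feature-based newsvendor problem, in which a decision-maker has access to historical data consisting of demand observations and exogenous features. In this setting, we investigate feature selection, aiming to derive sparse, explainable models with improved out-of-sample performance. Up to now, state-of-the-art methods have used regularization, which penalizes the number of selected features or the norm of the solution vector. As an alternative, we introduce a novel bilevel programming formulation. The upper-level problem selects a subset of features that minimizes an estimate of the out-of-sample cost of ordering decisions based on a held-out validation set. The lower-level problem learns the optimal coefficients of the decision function on a training set, using only the features selected by the upper level. We present a mixed-integer linear program reformulation of the bilevel program, which can be solved to optimality with standard optimization solvers. Our computational experiments show that the method accurately recovers ground-truth features even for instances with only a few hundred observations. In contrast, regularization-based techniques often fail at feature recovery or require thousands of observations to obtain similar accuracy. Regarding out-of-sample generalization, we achieve improved or comparable cost performance.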
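
As background for the cost that both levels of the bilevel program evaluate, here is a minimal sketch of the classical feature-free newsvendor problem: the per-period cost with an underage cost `b` and an overage cost `h`, and the sample-average order quantity given by the `b / (b + h)` empirical quantile of demand. The names `newsvendor_cost`, `average_cost`, `saa_order`, `b` and `h` are assumptions for illustration; the paper's feature-based decision function and its mixed-integer reformulation are not reproduced here.

```python
import math


def newsvendor_cost(order, demand, b, h):
    """Cost of ordering `order` units when `demand` units are requested.

    b: cost per unit of unmet demand; h: cost per unit of leftover stock.
    """
    return b * max(demand - order, 0.0) + h * max(order - demand, 0.0)


def average_cost(order, demands, b, h):
    """Average newsvendor cost of a fixed order over observed demands."""
    return sum(newsvendor_cost(order, d, b, h) for d in demands) / len(demands)


def saa_order(demands, b, h):
    """Smallest observed demand whose empirical CDF reaches b / (b + h)."""
    ratio = b / (b + h)
    ordered = sorted(demands)
    # Number of observations that must be covered; the small tolerance
    # guards against floating-point noise in ratio * n.
    k = math.ceil(ratio * len(ordered) - 1e-12)
    return ordered[max(k, 1) - 1]


if __name__ == "__main__":
    history = [10, 20, 30, 40, 50]
    q = saa_order(history, b=3.0, h=1.0)
    print(q, average_cost(q, history, b=3.0, h=1.0))
```

In the feature-based setting of the abstract, the constant order is replaced by a function of the features, and the same cost is averaged over the training set (lower level) and the validation set (upper level).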

Kernel Biclustering algorithm in Hilbert Spaces

Marcos Matabuena, J. C. Vidal, Oscar Hernan Madrid Padilla, Dino Sejdinovic

Categories: (Statistics) Machine Learning

2022-08-07

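Biclustering algorithms partition data and covariates simultaneously, providing new insights in several domains, such as analyzing gene expression to discover new biological functions. This paper develops a new model-free biclustering algorithm in abstract spaces using the notions of energy distance (ED) and maximum mean discrepancy (MMD), two distances between probability distributions capable of handling complex data such as curves or graphs. The proposed method can learn more general and complex cluster shapes than most existing approaches in the literature, which usually focus on detecting differences in mean and variance. Although the biclustering configurations of our approach are constrained to create disjoint structures at the datum and covariate levels, the results are competitive. Our results are similar to those of state-of-the-art methods in their optimal scenarios, assuming a proper kernel choice, and outperform them when cluster differences are concentrated in higher-order moments. The model's performance has been tested in several situations involving simulated and real-world datasets. Finally, new theoretical consistency results are established using tools from optimal transport theory.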
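
To make the energy distance concrete, the sketch below computes its plug-in (V-statistic) estimate between two samples of Euclidean vectors, using ED(X, Y) = 2 E|X - Y| - E|X - X'| - E|Y - Y'|. The function names are assumptions for illustration, and the Euclidean metric stands in for the more general distances and kernels the paper uses in abstract spaces.

```python
import math


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (x, y) with x in a, y in b."""
    return sum(math.dist(x, y) for x in a for y in b) / (len(a) * len(b))


def energy_distance(a, b):
    """Plug-in estimate of the energy distance between samples a and b.

    Each sample is a list of equal-length coordinate tuples.
    """
    return (
        2.0 * mean_pairwise_distance(a, b)
        - mean_pairwise_distance(a, a)
        - mean_pairwise_distance(b, b)
    )


if __name__ == "__main__":
    left = [(0.0,), (0.0,)]
    right = [(1.0,), (1.0,)]
    print(energy_distance(left, right))  # two point masses one unit apart
    print(energy_distance(left, left))   # identical samples
```

The estimate is zero for identical samples and grows as the two empirical distributions separate, which is the property a distribution-based clustering criterion relies on.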

Multi-Task Transformer with uncertainty modelling for Face Based Affective Computing

Gauthier Tallec, Jules Bonnard, Arnaud Dapogny, Kévin Bailly

Categories: Computer Vision

2022-08-06

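Face-based affective computing consists of detecting emotions from face images. It is useful for unlocking better automatic comprehension of human behavior and could pave the way toward improved human-machine interaction. However, it comes with the challenging task of designing a computational representation of emotions. So far, emotions have been represented either continuously in the 2D valence/arousal space or discretely with Ekman's 7 basic emotions. Alternatively, Ekman's Facial Action Unit (AU) system has also been used to characterize emotions using a codebook of unitary muscular activations. The ABAW3 and ABAW4 Multi-Task Challenges are the first work to provide a large-scale database annotated with these three types of labels. In this paper, we present a transformer-based multi-task method for jointly learning to predict valence/arousal, action units and basic emotions. From an architectural standpoint, our method uses a task-wise token approach to efficiently model the similarities between the tasks. From a learning point of view, we use an uncertainty-weighted loss to model the difference in stochasticity between the annotations of the three tasks.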
